Near-Optimal Entrywise Sampling for Data Matrices

Authors

  • Dimitris Achlioptas
  • Zohar S. Karnin
  • Edo Liberty
Abstract

We consider the problem of selecting non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A - B\|_2$. For large $m \times n$ matrices such that $n \gg m$ (for example, representing $n$ observations over $m$ attributes), we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding $A$. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with $O(1)$ computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model.
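To make the idea of entrywise sampling concrete, here is a minimal sketch in Python. It does not implement the distributions proposed in the paper, which the abstract does not state. It assumes plain L1 sampling instead, where each non-zero is drawn with probability proportional to its absolute value. The function name `l1_sample_sketch`, the triple-based input format, and the parameters `s` and `seed` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def l1_sample_sketch(entries, s, seed=0):
    """Build a sparse sketch B of A by sampling s entries with replacement.

    entries: list of (row, col, value) triples, one per non-zero of A.
    Assumption (not from the paper): entry (i, j) is drawn with
    probability p_ij = |A_ij| / ||A||_1, i.e. plain L1 sampling.
    Returns a dict mapping (row, col) to the sketch value B_ij.
    """
    rng = random.Random(seed)
    weights = [abs(v) for _, _, v in entries]
    l1_norm = sum(weights)

    # Draw s positions with replacement, proportionally to |A_ij|.
    picks = rng.choices(entries, weights=weights, k=s)
    counts = Counter((i, j, v > 0) for i, j, v in picks)

    # Each draw contributes A_ij / (s * p_ij) = sign(A_ij) * ||A||_1 / s,
    # which makes B an unbiased estimator of A.
    unit = l1_norm / s
    return {
        (i, j): (count if positive else -count) * unit
        for (i, j, positive), count in counts.items()
    }


if __name__ == "__main__":
    A = [(0, 0, 3.0), (0, 1, -1.0), (1, 1, 2.0)]
    print(l1_sample_sketch(A, s=600, seed=1))
```

Under this assumed distribution, every stored value is an integer multiple of the single constant `l1_norm / s`, so the sketch can be kept as small signed counts. That is one simple way in which sampled non-zeros can be highly compressible. The sketch is offline and needs the L1 norm up front, so it does not address the streaming or near-optimality claims of the abstract.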

Similar resources

Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries ...

Spectral Method and Regularized MLE Are Both Optimal for Top-$K$ Ranking

This paper is concerned with the problem of top-K ranking from pairwise comparisons. Given a collection of n items and a few pairwise binary comparisons across them, one wishes to identify the set of K items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where...

Near-optimal Distributions for Data Matrix Sampling

We give near-optimal distributions for the sparsification of large $m \times n$ matrices, where $m \ll n$, for example representing $n$ observations over $m$ attributes. Our algorithms can be applied when the non-zero entries are only available as a stream, i.e., in arbitrary order, and result in matrices which are not only sparse, but whose values are also highly compressible. In particular, algebraic operatio...

Iterative Methods for Detecting Semipositive Matrices

A matrix $A \in \mathbb{R}^{n \times n}$ is said to be semipositive if there exists a positive $x \in \mathbb{R}^n$ such that $Ax$ is positive. Semipositivity generalizes several of the notions of positivity of a matrix, including entrywise positive matrices, diagonally dominant matrices with positive diagonal elements, and P-matrices. Here, we illustrate the geometric nature of the semipositivity property, list some basic facts about ...

Functions Preserving Nonnegativity of Matrices

The main goal of this work is to determine which entire functions preserve nonnegativity of matrices of a fixed order $n$, i.e., to characterize entire functions $f$ with the property that $f(A)$ is entrywise nonnegative for every entrywise nonnegative matrix $A$ of size $n \times n$. Towards this goal, we present a complete characterization of functions preserving nonnegativity of (block) upper-triangular matri...

Publication date: 2013